Reproduced from PCA using Python (scikit-learn)
One of the most important applications of PCA is speeding up machine learning algorithms. The MNIST database of handwritten digits is well suited to demonstrating this: it has 784 feature columns (784 dimensions), a training set of 60,000 examples, and a test set of 10,000 examples.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.express as px  # if you don't have plotly, install it with "pip install plotly"
#model validation
from sklearn.model_selection import train_test_split
# PCA
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import roc_auc_score
import warnings
warnings.filterwarnings("ignore")
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784')
mnist.data.shape
(70000, 784)
mnist.data.head()
|   | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | pixel9 | pixel10 | ... | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | pixel784 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 784 columns
mnist.target.head()
0    5
1    0
2    4
3    1
4    9
Name: class, dtype: category
Categories (10, object): ['0', '1', '2', '3', ..., '6', '7', '8', '9']
The images that you downloaded are contained in mnist.data, which has a shape of (70000, 784), meaning there are 70,000 images with 784 features each. The labels (the integers 0–9) are contained in mnist.target. The features are 784-dimensional because each 28 x 28 image is flattened into a single row of pixel values.
The goal is to predict the digits 0 through 9.
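As a quick sanity check, a row of mnist.data can be reshaped back into a 28 x 28 image and displayed. The sketch below uses a random vector as a stand-in for one MNIST row, so it runs without downloading the dataset; in the tutorial's context you would use something like `mnist.data.iloc[0].to_numpy()` instead.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for one row of mnist.data (784 flattened pixel values)
row = np.random.default_rng(0).integers(0, 256, size=784).astype(float)

image = row.reshape(28, 28)  # undo the flattening: 784 -> 28 x 28
plt.imshow(image, cmap='gray')
plt.axis('off')
```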

X_train, X_test, y_train, y_test = train_test_split(mnist.data, mnist.target, test_size=1/7.0, random_state=0)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(60000, 784) (60000,) (10000, 784) (10000,)
# standardizing X features: fit the scaler on the training set only,
# then apply the same mean/std to the test set to avoid data leakage
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_train_scaled[:2]
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
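The fit-on-train, transform-on-test pattern can be verified on small synthetic data (a sketch, not the MNIST matrix): after `fit_transform`, the training columns have mean 0 and standard deviation 1, while the test set is shifted by the *training* statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_tr = rng.normal(5.0, 2.0, size=(100, 3))  # synthetic "training" features
X_te = rng.normal(5.0, 2.0, size=(20, 3))   # synthetic "test" features

scaler_demo = StandardScaler()
X_tr_s = scaler_demo.fit_transform(X_tr)  # learn mean/std from training data
X_te_s = scaler_demo.transform(X_te)      # reuse those statistics on test data

print(np.allclose(X_tr_s.mean(axis=0), 0))  # True
print(np.allclose(X_tr_s.std(axis=0), 1))   # True
```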
Notice the code below passes .95 as the n_components parameter. This tells scikit-learn to choose the minimum number of principal components such that 95% of the variance is retained.
from sklearn.decomposition import PCA
# Make an instance of the Model
pca = PCA(n_components=0.95)
pca
PCA(n_components=0.95)
# fit_transform X_train_scaled
X_train_pca = pca.fit_transform(X_train_scaled)
# Transform X_test_scaled
X_test_pca = pca.transform(X_test_scaled)
X_train_pca[:2]
array([[-3.69763425e+00,  9.66873129e+00, -1.90737826e+00, ...,
         1.83551695e-01, -2.81001667e-01, -1.84475487e-02],
       [-1.07779826e+00, -8.77945861e-01,  4.44723208e+00, ...,
         3.27869318e-01, -1.61930473e-01, -1.44981460e-01]])
# Check the number of components that have been created
pca.n_components_
327
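Because the 327 retained components capture 95% of the variance, the reduced data can be mapped back to the original 784-dimensional space with `inverse_transform`; the reconstruction is lossy, but close. The sketch below demonstrates this round trip on low-rank synthetic data (a stand-in, not the scaled MNIST matrix), using a separate `pca_demo` instance so nothing in the tutorial's pipeline is disturbed.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Low-rank synthetic stand-in for the scaled training matrix
X = rng.normal(size=(200, 50)) @ rng.normal(size=(50, 784))

pca_demo = PCA(n_components=0.95)
X_reduced = pca_demo.fit_transform(X)
X_restored = pca_demo.inverse_transform(X_reduced)  # back to 784 columns, lossy

print(X_reduced.shape[1] < X.shape[1])                  # fewer components than features
print(pca_demo.explained_variance_ratio_.sum() >= 0.95) # the retained-variance guarantee
```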
# explained variance of each component
print(pca.explained_variance_ratio_)
[0.05685361 0.04063911 0.03763558 0.02922128 0.02528914 0.02204721
 0.01928718 0.01757131 0.01540722 0.01405428 ...
 0.00037281 0.00036816 0.00036678 0.00036217 0.00036144]
(327 values in total, one per retained component, in decreasing order)
# cumulative sum
print(pca.explained_variance_ratio_.cumsum())
[0.05685361 0.09749272 0.1351283  0.16434958 0.18963872 0.21168593
 0.2309731  0.24854441 0.26395164 0.27800592 ...
 0.94874274 0.9491109  0.94947768 0.94983985 0.95020129]
(the 327th and final value, 0.95020129, is the first to reach the 95% threshold)
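This cumulative sum is exactly how the component count is chosen: the smallest index at which the cumulative ratio reaches 0.95 matches what `PCA(n_components=0.95)` selects. A sketch on low-rank synthetic data (not the MNIST matrix):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Low-rank synthetic stand-in for the scaled training data
X = rng.normal(size=(300, 40)) @ rng.normal(size=(40, 100))

pca_full = PCA().fit(X)  # keep all components
cumul = np.cumsum(pca_full.explained_variance_ratio_)
k = int(np.argmax(cumul >= 0.95)) + 1  # first component count reaching 95%

# PCA(n_components=0.95) picks this same k automatically
k_auto = PCA(n_components=0.95).fit(X).n_components_
print(k, k_auto)
```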
exp_var_cumul = np.cumsum(pca.explained_variance_ratio_)
total_var = pca.explained_variance_ratio_.sum() * 100
px.area(
x=range(1, exp_var_cumul.shape[0] + 1),
y=exp_var_cumul,
title=f'Total Explained Variance: {total_var:.2f}%',
labels={"x": "# Components", "y": "Explained Variance"}
)
# initialize LogisticRegression algorithm
logisticRegr = LogisticRegression(solver='lbfgs', max_iter=2000, random_state=0)
# fit
logisticRegr.fit(X_train_pca, y_train)
LogisticRegression(max_iter=2000, random_state=0)
# model's overall accuracy
print(metrics.accuracy_score(y_test, logisticRegr.predict(X_test_pca)))
0.9185
# model's confusion matrix
print(metrics.confusion_matrix(y_test, logisticRegr.predict(X_test_pca)))
[[ 965    0    2    3    1   10   10    0    4    1]
 [   0 1106   11    1    1    7    0    4    9    2]
 [   3   14  932   19   13    4   13   12   27    3]
 [   2    7   39  889    1   29    1   12   20   13]
 [   1    3    8    0  900    0   11    8    4   27]
 [   7    2   10   29    7  759   14    3   27    5]
 [   8    3    9    0   14   14  935    1    5    0]
 [   4    5   16    2   12    5    0  976    5   39]
 [   3   19    9   19    7   26    6    2  858   14]
 [   4    5    4   11   30    9    1   33    7  865]]
import scikitplot as skplt  # if needed: pip install scikit-plot
skplt.metrics.plot_confusion_matrix(y_true=y_test,
                                    y_pred=logisticRegr.predict(X_test_pca))
plt.show()
The whole point of this section of the tutorial was to show that you can use PCA to speed up the fitting of machine learning algorithms. In the original tutorial, a table shows how long it took to fit logistic regression on the author's MacBook after applying PCA, retaining different amounts of variance each time.
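The timing comparison can be reproduced in miniature with `time.perf_counter`. The sketch below uses small low-rank synthetic data instead of MNIST so it runs quickly; on tiny inputs the wall-clock difference may be negligible, but the pattern is the same one used to produce the original measurements.

```python
import time
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Low-rank synthetic stand-in for the scaled training set
X = rng.normal(size=(500, 30)) @ rng.normal(size=(30, 200))
y = rng.integers(0, 10, size=500)

def fit_time(features):
    """Return seconds taken to fit logistic regression on `features`."""
    start = time.perf_counter()
    LogisticRegression(max_iter=200).fit(features, y)
    return time.perf_counter() - start

t_full = fit_time(X)                              # all 200 features
X_pca = PCA(n_components=0.95).fit_transform(X)   # reduced features
t_pca = fit_time(X_pca)
print(f"full: {t_full:.3f}s  pca: {t_pca:.3f}s  ({X_pca.shape[1]} components)")
```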

import sklearn
print(sklearn.__version__)
1.4.1.post1